摘要 :
In this paper, we present a comprehensive study of full-chip power, performance, and area metric for monolithic 3D (M3D) IC designs at the 7nm technology node. We investigate the benefits of M3D designs using our predictive 7nm Fi...
展开
In this paper, we present a comprehensive study of full-chip power, performance, and area metric for monolithic 3D (M3D) IC designs at the 7nm technology node. We investigate the benefits of M3D designs using our predictive 7nm FinFET libraries. This paper outlines detailed iso-performance power comparisons between M3D and 2D full-chip GDSII designs using both 7nm high performance (HP) and low stand-by power (LSTP) library cells. We achieve significant wire-length and buffer reduction with 7nm HP M3D designs over 2D counterparts, thus more power saving at high iso-performance frequency. In addition, this power saving is also realized in 7nm LSTP M3D designs running at low iso-performance frequencies. We also study the impact of clock tree design on the clock power consumption in M3D designs. Lastly, we demonstrate the impact of clock tree partitioning on the total power of full-chip M3D designs. Our experiments show that 7nm HP and LSTP M3D designs outperform its 2D counterparts by 12% and 10% on average, respectively.
收起
摘要 :
In this paper we study power, performance, and cost (PPC) tradeoffs for 2-tier, gate-level, full-chip GDS monolithic 3D ICs (M3D) built using a foundry-grade 7nm bulk FinFET technology. We first develop highly-accurate wafer and d...
展开
In this paper we study power, performance, and cost (PPC) tradeoffs for 2-tier, gate-level, full-chip GDS monolithic 3D ICs (M3D) built using a foundry-grade 7nm bulk FinFET technology. We first develop highly-accurate wafer and die cost models for 2D and M3D to study PPC tradeoffs. In our study, both 2D and M3D designs are optimized in terms of the number of BEOL metal layers used for routing to obtain the best possible PPC values. We develop a new CAD methodology for 2-tier gate-level M3D, named Projected 2D Flow, that allows us to accurately compare RC parasitics of equivalent nets in both 2D and M3D designs. Our experiments based on two different circuit types (BEOL-dominant vs. FEOL-dominant) confirm that M3D designs indeed offer a significant footprint saving. However, to our surprise, the PPC quality of M3D turns out to be worse than that of 2D by 34% due to the high wafer cost of M3D. Our study also reveals that M3D wafer yield should be as high as 90% of 2D wafer yield, and the M3D device manufacturing cost should be less than 33% of that of 2D to justify the adoption of M3D technology at the 7nm era. Lastly, and counter-intuitively, our study shows that FEOL-dominant circuit shows more PPC benefits from M3D technology than BEOL-dominant circuit.
收起
摘要 :
In this paper we study power, performance, and cost (PPC) tradeoffs for 2-tier, gate-level, full-chip GDS monolithic 3D ICs (M3D) built using a foundry-grade 7nm bulk FinFET technology. We first develop highly-accurate wafer and d...
展开
In this paper we study power, performance, and cost (PPC) tradeoffs for 2-tier, gate-level, full-chip GDS monolithic 3D ICs (M3D) built using a foundry-grade 7nm bulk FinFET technology. We first develop highly-accurate wafer and die cost models for 2D and M3D to study PPC tradeoffs. In our study, both 2D and M3D designs are optimized in terms of the number of BEOL metal layers used for routing to obtain the best possible PPC values. We develop a new CAD methodology for 2-tier gate-level M3D, named Projected 2D Flow, that allows us to accurately compare RC parasitics of equivalent nets in both 2D and M3D designs. Our experiments based on two different circuit types (BEOL-dominant vs. FEOL-dominant) confirm that M3D designs indeed offer a significant footprint saving. However, to our surprise, the PPC quality of M3D turns out to be worse than that of 2D by 34% due to the high wafer cost of M3D. Our study also reveals that M3D wafer yield should be as high as 90% of 2D wafer yield, and the M3D device manufacturing cost should be less than 33% of that of 2D to justify the adoption of M3D technology at the 7nm era. Lastly, and counter-intuitively, our study shows that FEOL-dominant circuit shows more PPC benefits from M3D technology than BEOL-dominant circuit.
收起
摘要 :
Monolithic 3D integration provides massive vertical integration through the use of nanoscale inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make ILVs especially pron...
展开
Monolithic 3D integration provides massive vertical integration through the use of nanoscale inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make ILVs especially prone to defects. We present a low-cost built-in self-test (BIST) method to detect opens, stuck-at faults (SAFs), and bridging faults (shorts) in ILVs. Two test patterns-all-1s and all-0s-are applied to the input side of a set of ILVs (e.g., making up a bus between two tiers). On the adjacent tier (the output side of the ILVs), the test responses are compacted to a 2-bit signature through space compaction. We prove that this compaction solution does not introduce any fault aliasing. Simulations results using HSPICE and M3D benchmark designs show that the proposed BIST method requires low area overhead and test time, but provides effective fault localization and the detectability of a wide range of resistive faults.
收起
摘要 :
Monolithic 3D integration provides massive vertical integration through the use of nanoscale inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make ILVs especially pron...
展开
Monolithic 3D integration provides massive vertical integration through the use of nanoscale inter-layer vias (ILVs). However, high integration density and aggressive scaling of the inter-layer dielectric make ILVs especially prone to defects. We present a low-cost built-in self-test (BIST) method to detect opens, stuck-at faults (SAFs), and bridging faults (shorts) in ILVs. Two test patterns-all-1s and all-0s-are applied to the input side of a set of ILVs (e.g., making up a bus between two tiers). On the adjacent tier (the output side of the ILVs), the test responses are compacted to a 2-bit signature through space compaction. We prove that this compaction solution does not introduce any fault aliasing. Simulations results using HSPICE and M3D benchmark designs show that the proposed BIST method requires low area overhead and test time, but provides effective fault localization and the detectability of a wide range of resistive faults.
收起
摘要 :
A liquid state machine (LSM) is a powerful recurrent spiking neural network shown to be effective in various learning tasks including speech recognition. In this work, we investigate design and architectural co-optimization to fur...
展开
A liquid state machine (LSM) is a powerful recurrent spiking neural network shown to be effective in various learning tasks including speech recognition. In this work, we investigate design and architectural co-optimization to further improve the area-energy efficiency of LSM-based speech recognition processors with monolithic 3D IC (M3D) technology. We conduct fine-grained tier partitioning, where individual neurons are folded, and explore the impact of shared memory architecture and synaptic model complexity on the power-performance-area-accuracy (PPA) benefit of M3D LSM-based speech recognition. In training and classification tasks using spoken English letters, we obtain up to 70.0\% PPAA savings over 2D ICs.
收起
摘要 :
As small-form-factor and low-power end devices matter in the cloud networking and Internet-of-Things Era, the bio-inspired neuromorphic architectures attract great attention recently in the hope of reaching the energy-efficiency o...
展开
As small-form-factor and low-power end devices matter in the cloud networking and Internet-of-Things Era, the bio-inspired neuromorphic architectures attract great attention recently in the hope of reaching the energy-efficiency of brain functions. Out of promising solutions, a liquid state machine (LSM), that consists of randomly and recurrently connected reservoir neurons and trainable readout neurons, has shown a great promise in delivering brain-inspired computing power. In this work, we adopt the state-of-the-art face-to-face (F2F)-bonded 3D IC flow named Compact-2D [4] to the LSM processor design, and study the power-area-accuracy benefits of 3D LSM ICs targeting the next generation commercial-grade neuromorphic computing platforms. First, we analyze how the different size and connection density of a reservoir in the LSM architecture affects the learning performance using the real-world speech recognition benchmark. Also, we explore how much the power-area design overhead should be paid off to enable better classification accuracy. Based on the power-area-accuracy trade-off, we implement a F2F-bonded 3D LSM IC using the optimal LSM architecture, and finally justify that 3D integration practically benefits the LSM processor design in huge form factor and power savings while preserving the best learning performance.
收起
摘要 :
As small-form-factor and low-power end devices matter in the cloud networking and Internet-of-Things Era, the bio-inspired neuromorphic architectures attract great attention recently in the hope of reaching the energy-efficiency o...
展开
As small-form-factor and low-power end devices matter in the cloud networking and Internet-of-Things Era, the bio-inspired neuromorphic architectures attract great attention recently in the hope of reaching the energy-efficiency of brain functions. Out of promising solutions, a liquid state machine (LSM), that consists of randomly and recurrently connected reservoir neurons and trainable readout neurons, has shown a great promise in delivering brain-inspired computing power. In this work, we adopt the state-of-the-art face-to-face (F2F)-bonded 3D IC flow named Compact-2D [4] to the LSM processor design, and study the power-area-accuracy benefits of 3D LSM ICs targeting the next generation commercial-grade neuromorphic computing platforms. First, we analyze how the different size and connection density of a reservoir in the LSM architecture affects the learning performance using the real-world speech recognition benchmark. Also, we explore how much the power-area design overhead should be paid off to enable better classification accuracy. Based on the power-area-accuracy trade-off, we implement a F2F-bonded 3D LSM IC using the optimal LSM architecture, and finally justify that 3D integration practically benefits the LSM processor design in huge form factor and power savings while preserving the best learning performance.
收起
摘要 :
A liquid state machine (LSM) is a powerful recurrent spiking neural network shown to be effective in various learning tasks including speech recognition. In this work, we investigate design and architectural co-optimization to fur...
展开
A liquid state machine (LSM) is a powerful recurrent spiking neural network shown to be effective in various learning tasks including speech recognition. In this work, we investigate design and architectural co-optimization to further improve the area-energy efficiency of LSM-based speech recognition processors with monolithic 3D IC (M3D) technology. We conduct fine-grained tier partitioning, where individual neurons are folded, and explore the impact of shared memory architecture and synaptic model complexity on the power-performance-area-accuracy (PPA) benefit of M3D LSM-based speech recognition. In training and classification tasks using spoken English letters, we obtain up to 70.0\% PPAA savings over 2D ICs.
收起
摘要 :
Neuromorphic systems consist of a framework of spiking neurons interconnected via plastic synaptic junctures. The discovery of a two terminal passive nanoscale memristive device has spurred great interest in the realization of mem...
展开
Neuromorphic systems consist of a framework of spiking neurons interconnected via plastic synaptic junctures. The discovery of a two terminal passive nanoscale memristive device has spurred great interest in the realization of memristive plastic synapses in neural networks. In this work, a synapse structure is presented that utilizes a pair of memristors, to implement both positive and negative weights. The working scheme of this synapse as an electrical interlink between neurons is explained, and the relative timing of their spiking events is analyzed, which leads to a modulation of the synaptic weight in accordance with the spike-timing-dependent plasticity (STDP) rule. A digital pulse width modulation technique is proposed to achieve these variable changes to the synaptic weight. The synapse architecture presented is shown to have high accuracy when used in neural networks for classification tasks. Lastly, the energy requirement of the system during various phases of operation is presented.
收起